arena: rename inner-SIMD-align knob and drop default 64 -> 32 by evaleev · Pull Request #552 · ValeevGroup/tiledarray

evaleev · 2026-05-21T09:12:22Z

Summary

Two related changes to the ArenaTensor in-cell alignment knob in src/TiledArray/tensor/arena_tensor.h:

Rename TILEDARRAY_INNER_SIMD_ALIGN → TILEDARRAY_ARENATENSOR_SIMD_ALIGN (and kInnerSimdAlign → kArenaTensorSimdAlign), so the knob's name reflects the type whose layout it parametrizes. Hard cut — no compat alias, since there are no external users yet.
Lower the default from 64 B to 32 B. 32 B covers AVX2 YMM (the most common x86_64 SIMD target today) and shaves 32 B off data_offset per inner cell. AVX-512 builds that want the wider floor stay one -DTILEDARRAY_ARENATENSOR_SIMD_ALIGN=64 away.
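
For readers unfamiliar with this kind of knob, here is a minimal sketch of a header-only alignment macro with an overridable default, using the names from this description. The `sketch` namespace, the `is_power_of_two` helper, and the `static_assert` are illustrative assumptions; the actual definition in arena_tensor.h may look different.

```cpp
#include <cstddef>

// Hypothetical reconstruction of the knob, not the actual TiledArray source.
// A build can override the default with
//   -DTILEDARRAY_ARENATENSOR_SIMD_ALIGN=64
#ifndef TILEDARRAY_ARENATENSOR_SIMD_ALIGN
#define TILEDARRAY_ARENATENSOR_SIMD_ALIGN 32  // default after this PR
#endif

namespace sketch {

// Typed constant mirroring the macro, so code does not touch the macro directly.
constexpr std::size_t kArenaTensorSimdAlign = TILEDARRAY_ARENATENSOR_SIMD_ALIGN;

// True if n is a nonzero power of two.
constexpr bool is_power_of_two(std::size_t n) {
  return n != 0 && (n & (n - 1)) == 0;
}

// Assumed sanity check: mask-based round-up only works for powers of two.
static_assert(is_power_of_two(kArenaTensorSimdAlign),
              "SIMD alignment must be a power of two");

}  // namespace sketch
```

With no -D override on the compile line, `sketch::kArenaTensorSimdAlign` evaluates to 32.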

Why this matters: each ArenaTensor cell pads from sizeof(Cell) (~14 B for btas::zb::RangeNd<>) up to this alignment before its element storage, so the per-inner-cell bookkeeping cost is data_offset plus an 8 B view pointer. On a ToT (tensor-of-tensors) tile with millions of inner cells (e.g. PNO-CCSD), the difference between 32 B and 64 B padding is on the order of hundreds of MB of memory.
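
The padding arithmetic can be sketched as follows. The 14 B cell size and 8 B view pointer come from the text above. The helper names (`align_up`, `per_cell_overhead`, `savings_bytes`) and the cell count in the usage note are hypothetical and only illustrate the scale.

```cpp
#include <cstddef>
#include <cstdint>

namespace sketch {

// Round n up to the next multiple of align (align must be a power of two).
constexpr std::size_t align_up(std::size_t n, std::size_t align) {
  return (n + align - 1) & ~(align - 1);
}

// Sizes quoted in the PR description; treated here as illustrative constants.
constexpr std::size_t kCellBytes = 14;    // approximate sizeof(Cell)
constexpr std::size_t kViewPtrBytes = 8;  // view pointer per inner cell

// Per-cell bookkeeping: padded header (data_offset) plus the view pointer.
constexpr std::size_t per_cell_overhead(std::size_t align) {
  return align_up(kCellBytes, align) + kViewPtrBytes;
}

// Bytes saved by going from 64 B to 32 B alignment for ncells inner cells.
constexpr std::uint64_t savings_bytes(std::uint64_t ncells) {
  return ncells * (per_cell_overhead(64) - per_cell_overhead(32));
}

}  // namespace sketch
```

Under these assumptions the overhead is 40 B per cell at 32 B alignment and 72 B at 64 B, so a hypothetical 4 million inner cells would save 128,000,000 B (128 MB).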

The doc comment now spells out the reasonable overrides:

64 — AVX-512 ZMM (and the x86_64 cache line)
16 — NEON-only / Apple Silicon (NEON has no wider register, and Apple Silicon doesn't implement SVE)
128 — two-cache-line / Apple-Silicon L1-line floor (false-sharing motivation only)
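
As a compact restatement of the values listed above, the sketch below maps each scenario to its alignment. The `Target` enum and `suggested_align` function are hypothetical names introduced for illustration. Nothing in the PR suggests TiledArray selects the value automatically; the knob is set by hand.

```cpp
#include <cstddef>

namespace sketch {

// Hypothetical labels for the scenarios named in the doc comment.
enum class Target { Neon, Avx2, Avx512, TwoCacheLines };

// Alignment in bytes suggested for each scenario, per the list above.
constexpr std::size_t suggested_align(Target t) {
  return t == Target::Neon     ? 16   // 128-bit NEON registers
         : t == Target::Avx2   ? 32   // 256-bit YMM registers (new default)
         : t == Target::Avx512 ? 64   // 512-bit ZMM registers
                               : 128; // false-sharing floor only
}

}  // namespace sketch
```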

Test plan

arena_suite, arena_kernels_suite, arena_einsum_unit_suite, arena_tot_trivial_suite, arena_sizeof_invariant_suite, arena_tensor_suite, arena_tensor_kernels_suite all pass against the new default (np=1, debug build, TA_ASSERT_POLICY=TA_ASSERT_THROW).
CI (np=1 + np=2, full matrix) — let CI run.
MPQC consumer build with TA repinned to this commit — done out-of-tree.

…4 -> 32

Rename the in-arena element-storage alignment knob to TA_ARENATENSOR_SIMD_ALIGN (matching the TA_ prefix used by every other TA CMake option/var), and wire it through the same CMake -> config.h.in -> header pipeline as TA_MAX_SOO_RANK_METADATA. The previous TILEDARRAY_INNER_SIMD_ALIGN was a header-only #ifndef/#define knob with no CMake surface; the new form is a proper cache variable, documented in INSTALL.md.

Drop the default from 64 B to 32 B: 32 B covers AVX2 YMM loads/stores (the most common x86_64 SIMD target today) and shaves 32 B/cell off the in-arena padding. AVX-512 builds that want a wider floor are one `-DTA_ARENATENSOR_SIMD_ALIGN=64` away. The doc comment and INSTALL.md entry also call out the NEON / Apple-Silicon options (16 / 128).

No backward-compatible alias for the old macro/constant names -- there are no external users yet.

evaleev force-pushed the evaleev/arena/rename-and-bump-simd-align-default branch from cee791e to 3da31da May 21, 2026 09:51

evaleev merged commit 600c4ad into master May 21, 2026
9 checks passed

evaleev deleted the evaleev/arena/rename-and-bump-simd-align-default branch May 21, 2026 14:26
